Working with Categorical Data

GVPT399F: Power, Politics, and Data

Working with categorical data

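We often want to explore patterns in categorical (or discrete) data, such as which categories occur and how often. Summaries built for numeric variables, like means, histograms, and densities, do not carry over directly, so we need different tools: counts, bar charts, and factors.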

library(tidyverse)  # loads dplyr, forcats, and ggplot2 (which provides the mpg data)
select(mpg, manufacturer, model, drv)
# A tibble: 234 × 3
   manufacturer model      drv  
   <chr>        <chr>      <chr>
 1 audi         a4         f    
 2 audi         a4         f    
 3 audi         a4         f    
 4 audi         a4         f    
 5 audi         a4         f    
 6 audi         a4         f    
 7 audi         a4         f    
 8 audi         a4 quattro 4    
 9 audi         a4 quattro 4    
10 audi         a4 quattro 4    
# ℹ 224 more rows

Visualizing distributions

ggplot(mpg, aes(x = drv)) + 
  geom_bar()
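
The bar heights that geom_bar() draws are row counts per category. As a minimal sketch, the same numbers can be tabulated with count() from dplyr; the name drv_counts is only illustrative and does not come from the slides.

```r
library(dplyr)
library(ggplot2)  # provides the mpg dataset

# Count the rows in each drive-train category; sort = TRUE lists the largest first
drv_counts <- count(mpg, drv, sort = TRUE)
drv_counts

# Add each category's share of all rows
mutate(drv_counts, share = n / sum(n))
```

Having the table next to the plot makes it easier to read exact values off bars of similar height.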

Reorder the bars by frequency, most common category first, using fct_infreq() from forcats:

ggplot(mpg, aes(x = fct_infreq(drv))) +
  geom_bar()
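
fct_infreq() leaves the data untouched and only changes the order of the factor levels, which is the order ggplot2 uses to lay out a discrete axis. A small sketch that inspects the levels directly may make this clearer; pairing it with fct_rev() is an extra illustration, not something the slides use.

```r
library(forcats)
library(ggplot2)  # provides the mpg dataset

# drv is a character column; converted to a factor, its levels are in sorted order
levels(factor(mpg$drv))

# fct_infreq() reorders the levels from most to least frequent
levels(fct_infreq(mpg$drv))

# fct_rev() flips that order, so the rarest category comes first
levels(fct_rev(fct_infreq(mpg$drv)))
```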

Visualizing numeric variables

ggplot(mpg, aes(x = hwy)) +
  geom_histogram()
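
Called without arguments, geom_histogram() falls back to 30 bins and prints a message asking you to pick a better value. A hedged sketch with an explicit bin width follows; the width of 2 and the name p_hist are illustrative choices, not values from the slides.

```r
library(ggplot2)

# binwidth is given in the units of x (miles per gallon here)
p_hist <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2)
p_hist
```

Trying a few widths is worthwhile, since narrow bins show noise and wide bins can hide separate peaks.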

ggplot(mpg, aes(x = hwy)) +
  geom_density()

ggplot(mpg, aes(x = hwy, colour = drv, fill = drv)) +
  geom_density(alpha = 0.5)
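
The overlaid densities compare highway mileage across drive-train categories visually. As a sketch, the same comparison can be summarised numerically with group_by() and summarise() from dplyr; the name hwy_by_drv and the choice of median and mean are assumptions made for illustration.

```r
library(dplyr)
library(ggplot2)  # provides the mpg dataset

# One row per drive-train category with its size and typical highway mileage
hwy_by_drv <- mpg |>
  group_by(drv) |>
  summarise(
    n = n(),
    median_hwy = median(hwy),
    mean_hwy = mean(hwy)
  )
hwy_by_drv
```

Reporting n alongside the averages helps flag categories whose density curve rests on relatively few cars.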